Reordering Iterations in Runtime Loop Parallelization

Author

  • John Zahorjan
Abstract

When a loop in a sequential program is parallelized, it is normally guaranteed that all flow dependencies and anti-dependencies are respected, so that the result of parallel execution is always the same as sequential execution. In some cases, however, the algorithm implemented by the loop allows the iterations to be executed in a different sequential order than the one specified in the program. This opportunity can be exploited to expose parallelism that exists in the algorithm but is obscured by its sequential program implementation. In this paper, we show how parallelization of this kind of loop can be integrated into the runtime parallelization scheme of Saltz et al. [17, 18]. Runtime parallelization is a general technique appropriate for loops whose dependency structures cannot be determined at compile time. The compiler generates two pieces of code: the inspector examines dependencies at run time and computes a parallel schedule; the executor executes iterations in parallel according to the computed schedule. In our case, the inspector has to solve two problems: choosing an appropriate sequential order for the iterations and computing a parallel schedule. The two problems are treated as a single graph coloring problem, which is solved heuristically. Two methods to do so are described. Furthermore, the basic runtime parallelization scheme for shared-memory multiprocessors pays no attention to locality when scheduling iterations onto processors. One of our methods takes locality into account when making these decisions. The performance implications of reordering are examined experimentally on a KSR1 parallel machine as well as through a simple analytic model of execution time.
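To make the inspector/executor division concrete, here is a minimal Python sketch of one plausible instantiation, assuming a loop of the form "for i: x[idx[i]] = f(x[idx[i]])" in which iterations touching the same element conflict but commute, so any sequential order is legal. The names idx, f, inspector, and executor are illustrative assumptions, and the greedy program-order coloring merely stands in for the reordering heuristics developed in the paper.

from collections import defaultdict

def inspector(idx):
    # Inspector: assign each iteration a wavefront (color) such that no two
    # iterations touching the same element of x share a color.  Because the
    # iterations commute, they may be colored in any order; a reordering
    # heuristic would visit them in a different order than program order to
    # reduce the number of colors and balance wavefront sizes.
    touches = defaultdict(list)           # element -> iterations touching it
    for i, e in enumerate(idx):
        touches[e].append(i)
    color = [None] * len(idx)
    for i in range(len(idx)):             # greedy coloring in program order
        used = {color[j] for j in touches[idx[i]] if color[j] is not None}
        c = 0
        while c in used:                  # smallest color unused by conflicts
            c += 1
        color[i] = c
    return color

def executor(x, idx, f, color):
    # Executor: run one wavefront at a time; iterations within a wavefront
    # are mutually independent, so on a real machine each wavefront would be
    # distributed across processors.
    for wf in range(max(color) + 1):
        for i in range(len(idx)):         # conceptually a parallel loop
            if color[i] == wf:
                x[idx[i]] = f(x[idx[i]])

For example, with idx = [0, 1, 0, 2, 1] the inspector produces wavefronts {0, 1, 3} and {2, 4}: the two updates to x[0] (iterations 0 and 2) land in different wavefronts, while independent iterations execute together.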

Related articles

Communication-Free Parallelization via Affine Transformations

The paper describes a parallelization algorithm for programs consisting of arbitrary nestings of loops and sequences of loops. The code produced by our algorithm yields all the degrees of communication-free parallelism that can be obtained via loop fission, fusion, interchange, reversal, skewing, scaling, reindexing and statement reordering. The algorithm first assigns the iterations of instruction...

Online Dynamic Dependence Analysis for Speculative Polyhedral Parallelization

We present a dynamic dependence analyzer whose goal is to compute dependences from instrumented execution samples of loop nests. The resulting information serves as a prediction of the execution behavior during the remaining iterations and can be used to select and apply a speculatively optimizing and parallelizing polyhedral transformation of the target sequential loop nest. Thus, a parallel l...

Improved Affine Partition Algorithm for Compile-Time and Runtime Performance

The affine partitioning framework, which unifies many useful program transforms such as unimodular transformations, loop fusion, fission, scaling, reindexing, and statement reordering, has proved successful in the automatic discovery of loop-level parallelism in programs. The affine partition algorithm was improved in both compile-time and runtime efficiency in this p...

Evaluating the Performance Potential of Function Level Parallelism

Because of technology advances, the current trend in processor architecture design focuses on placing multiple cores on a single chip instead of increasing the complexity of single-core processors. These upcoming processors are able to execute several threads in parallel, which makes them a suitable platform for the application of automatic parallelization techniques. Most of the research efforts conc...

Publication date: 1992